Global Convergence of Two-Timescale Actor-Critic for Solving Linear Quadratic Regulator

Abstract

Actor-critic (AC) reinforcement learning algorithms have been the powerhouse behind many challenging applications. Nevertheless, their convergence is fragile in general. To study this instability, existing works mostly consider the uncommon double-loop variant or basic models with finite state and action spaces. We investigate the more practical single-sample two-timescale AC for solving the canonical linear quadratic regulator (LQR) problem, where the actor and the critic each update only once, with a single sample, in each iteration on an unbounded continuous state and action space. Existing analysis cannot conclude convergence for such a case. We develop a new analysis framework that allows establishing global convergence to an epsilon-optimal solution with at most order epsilon^(-2.5) sample complexity. To our knowledge, this is the first finite-time convergence analysis for single-sample two-timescale AC for solving LQR with global optimality. The sample complexity improves on those of other variants by orders, which sheds light on the practical wisdom of single-sample algorithms. We also further validate our theoretical findings via comprehensive simulation comparisons.
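To make the LQR setting concrete, here is a minimal sketch. It is not the paper's single-sample two-timescale actor-critic algorithm. It assumes a scalar system x_{t+1} = a x_t + b u_t with a linear policy u = -k x and stage cost q x^2 + r u^2, and it runs plain gradient descent on the exact cost. The values a = b = q = r = 1, the step size, and all function names are hypothetical choices for illustration.

```python
def riccati_gain(a, b, q, r, iters=500):
    """Optimal scalar LQR gain k* via fixed-point iteration of the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def cost(k, a, b, q, r):
    """Infinite-horizon cost of u = -k x starting from x0 = 1."""
    ac = a - b * k  # closed-loop coefficient
    assert abs(ac) < 1.0, "policy must be stabilizing"
    return (q + r * k * k) / (1.0 - ac * ac)


def cost_grad(k, a, b, q, r):
    """Exact derivative dJ/dk of the cost above (quotient rule)."""
    ac = a - b * k
    num = q + r * k * k
    den = 1.0 - ac * ac
    return (2.0 * r * k * den - num * 2.0 * b * ac) / (den * den)


# Hypothetical problem data and step size (assumptions, not from the paper).
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = 0.5  # initial stabilizing gain: |a - b*k| = 0.5 < 1
for _ in range(500):
    k -= 0.05 * cost_grad(k, a, b, q, r)

k_star = riccati_gain(a, b, q, r)
print(f"gradient-descent gain: {k:.6f}, Riccati gain: {k_star:.6f}")
```

In this toy instance the gradient iterate and the Riccati solution agree. The abstract's contribution concerns the much harder case where the gradient is replaced by an estimate built from a critic updated with a single sample per iteration.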

Similar resources

Linear Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off...

Global linear convergence of an augmented Lagrangian algorithm for solving convex quadratic optimization problems

We consider an augmented Lagrangian algorithm for minimizing a convex quadratic function subject to linear inequality constraints. Linear optimization is an important particular instance of this problem. We show that, provided the augmentation parameter is large enough, the constraint value converges globally linearly to zero. This property is viewed as a consequence of the proximal interpretat...

About One Sweep Algorithm for Solving Linear-Quadratic Optimization Problem with Unseparated Two-Point Boundary Conditions

In this paper, a linear-quadratic optimization problem (LCTOR) with unseparated two-point boundary conditions is considered. To solve this problem, a new sweep algorithm is proposed which doubles the dimension of the original system. In contrast to the well-known methods, it avoids solving linear matrix and nonlinear Riccati equations, since the solution of such multi-point optimi...

Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal gangli...

Global Optimization for Solving Linear Non-Quadratic Optimal Control Problems

This paper presents a global optimization approach to solving linear non-quadratic optimal control problems. The main work is to construct a differential flow for finding a global minimizer of the Hamiltonian function over a Euclidean space. With the Pontryagin principle, the optimal control is characterized by a function of the adjoint variable and is obtained by solving a Hamiltonian differentia...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i6.25865